Computational identification of host genomic biomarkers highlighting their functions, pathways and regulators that influence SARS-CoV-2 infections and drug repurposing

The pandemic threat of COVID-19 has severely destroyed human life as well as the economy around the world. Although, the vaccination has reduced the outspread, but people are still suffering due to the unstable RNA sequence patterns of SARS-CoV-2 which demands supplementary drugs. To explore novel drug target proteins, in this study, a transcriptomics RNA-Seq data generated from SARS-CoV-2 infection and control samples were analyzed. We identified 109 differentially expressed genes (DEGs) that were utilized to identify 10 hub-genes/proteins (TLR2, USP53, GUCY1A2, SNRPD2, NEDD9, IGF2, CXCL2, KLF6, PAG1 and ZFP36) by the protein–protein interaction (PPI) network analysis. The GO functional and KEGG pathway enrichment analyses of hub-DEGs revealed some important functions and signaling pathways that are significantly associated with SARS-CoV-2 infections. The interaction network analysis identified 5 TFs proteins and 6 miRNAs as the key regulators of hub-DEGs. Considering 10 hub-proteins and 5 key TFs-proteins as drug target receptors, we performed their docking analysis with the SARS-CoV-2 3CL protease-guided top listed 90 FDA approved drugs. We found Torin-2, Rapamycin, Radotinib, Ivermectin, Thiostrepton, Tacrolimus and Daclatasvir as the top ranked seven candidate drugs. We investigated their resistance performance against the already published COVID-19 causing top-ranked 11 independent and 8 protonated receptor proteins by molecular docking analysis and found their strong binding affinities, which indicates that the proposed drugs are effective against the state-of-the-arts alternatives independent receptor proteins also. Finally, we investigated the stability of top three drugs (Torin-2, Rapamycin and Radotinib) by using 100 ns MD-based MM-PBSA simulations with the two top-ranked proposed receptors (TLR2, USP53) and independent receptors (IRF7, STAT1), and observed their stable performance. Therefore, the proposed drugs might play a vital role for the treatment against different variants of SARS-CoV-2 infections.

The gene regulatory network (GRN) analysis of hub-DEGs. Transcription factors (TFs) and micro-RNAs (miRNAs) are considered as the most important transcriptional and post transcriptional molecular regulatory factors of genes. We constructed undirected interaction network between TFs and hub-DEGs to explore key regulatory transcriptional factors of hub-DEGs (Fig. 5). In this network, hub-DEGs were represented by round nodes with red color and TFs were represented by square nodes with blue color, where larger numbers of connectivity were represented by the larger nodes. We selected FOXC1, GATA2, SRF, FOXL1, YY1 and NFIC as the top 6 key regulatory TFs of hub-DEGs based on higher degree of topological measures.
To explore key regulatory post-transcriptional factors of hub-DEGs, we constructed undirected interaction network between miRNAs and hub-DEGs (Fig. 6). In this network, hub-DEGs were represented by round nodes with red color and miRNAs were represented by octagonal nodes with green color, where larger numbers of   Table S2 show the disease versus hub-DEGs interaction network analysis results. We observed that IGF2 gene is associated with 120 diseases including Cardiovascular Diseases, Colorectal Neoplasms, Cardiomyopathies, Liver carcinoma, Anemia; the CXCL2 gene is associated with 19 diseases including rheumatoid arthritis, heart failure, hypertensive disease, IBN inflammation, pulmonary fibrosis, acute lung injury while the ZFP36, KLF6, GUCY1A2 and PAG1 was linked with 18 diseases including liver cirrhosis, experimental prostatic neoplasms, stomach carcinoma, inflammation, arthritis, especially which could be a severe comorbidities of the COVID-19 patients. To assess the association of hub-DEGs with lung cancer, we performed multivariate survival analysis of lung cancer patients with expressions of hub-DEGs. The log-rank test was used to test the significant difference between two survivals curves (Fig. 7b) based on all hub-DEGs. We observed the significant difference between the low and high-risk group in survival probability, which indicates that the hub proteins are significantly associated with lung cancer.
Drug repurposing by molecular docking study. We considered 10 hub-proteins corresponding to our proposed 10 hub-DEGs and their regulatory 5 key TFs-proteins (FOXC1, GATA2, SRF, FOXL1 and YY1) as the m = 15 drug target receptors. The SARS-CoV-2 3CL protease-guided top-ranked n = 90 drugs out of 3410 FDA approved antiviral drugs 50 , were considered as the drug agents that were mentioned previously in the materials and method section. We downloaded 3D structure of 7 hub proteins (SNRPD2, ZFP36, NEDD9, IGF2, TLR2, and CXCL2) from Protein Data Bank (PDB) 51 with source codes 5xjs, 4j8s, 2l81, 1igl, 1fyw, and 5ob5, respectively. The 3D structure of 3 hub proteins (KLF6, USP53, GUCY1A2, and PAG1) were downloaded from SWISS-MODEL 52 using UniProt 53 ID of Q99612, Q70EK8, P33402, and Q9NWQ8, respectively. The 3D structure of 3 TFs proteins (GATA2, SRF and YY1) were downloaded from PDB with source codes 5o9b, 1hbx, and 1ubd, respectively, and the rest 2 TFs proteins (FOXC1 and FOXL1) were downloaded from SWISS-MODEL using UniProt ID of Q12948 and Q12952. The 3D structures of 90 FDA-approved drugs were downloaded from PubChem database 54 . Then we performed molecular docking study to calculate binding affinity scores for each pair of receptors and drug agents. Then we ordered the receptor proteins according to the descending order of row sums of the binding affinity score matrix A = (A ij ) and drug agents according to the column sums to select the top-ranked few drug agents as the candidate drugs. Figure 8a displayed the image of binding affinity matrix A * = A * ij corresponding to the ordered target proteins in Y-axis and top-ordered 50 drug agents out of 90 in www.nature.com/scientificreports/ X-axis. We observed that the first two top lead compounds (lead1: Torin-2 and lead2: Rapamycin) produce binding affinity scores less than or equal to − 8.0 kcal/mol with all of our proposed 15 receptor proteins. The next (3-7)th top lead compounds (lead3:Radotinib, lead4:Ivermectin, lead5:Thiostrepton, lead6:Tacrolimus and lead7:Daclatasvi) produced binding affinity scores less than or equal to − 8.0 kcal/mol with our proposed 13 receptor proteins out of 15. The rest 83 lead compounds produced binding affinity scores less than or equal to − 8.0 kcal/mol with the fewer number of receptors. Therefore, we considered the top 7 lead compounds (Torin-2, Rapamycin, Radotinib, Ivermectin, Thiostrepton, Tacrolimus and Daclatasvi) as the most probable candidate drugs for SARS-CoV-2 infections. To validate our proposed 7 candidate drugs by molecular docking study with already published independent receptor proteins (genomic biomarkers) associated with SARS-CoV-2 infections available in the literature, we reviewed 22 published articles associated with SARS-CoV-2 infections those provided transcriptome guided hubproteins (genomic biomarkers). We found total of 193 hub-proteins published in those 22 articles, but there was no hub-protein commonly published in all articles (Table 4). We found 11 hub-proteins (ICAM1, IRF7, MX1, NFKBIA, STAT1, IL6, TNF, CCL20, CXCL8, VEGFA, and CASP3) which were commonly reported in at most 3 articles out of 22 (Table 4). We considered these 11 hub-proteins as the publicly available top ranked receptor proteins associated with SARS-CoV-2 infections to validate the proposed repurposed drugs by molecular docking. The 3D structures of these 11 (ICAM1, IRF7, MX1, NFKBIA, STAT1, IL6, TNF, CCL20, CXCL8, VEGFA, and CASP3) proteins were retrieved from Protein Data Bank (PDB) with codes 5mza, 2o61, 3szr, 1nfi, 1bf5, 1il6, 1tnf, 2jyo, 1ikl, 1cz8, and 1gfw, respectively. Then molecular docking interactions of our proposed drugs with the publicly available top ranked receptor proteins were performed. Their binding affinities (kcal/mol) were visualized in Fig. 8b. We observed that their binding affinities ranged between (− 12.1 to − 7) kcal/mol and average binding affinities were less than or equal to − 9.5 kcal/mol which indicates the strong binding capacity. Then we compared the docking results of top-ranked eight receptor proteins (four from the proposed set and the other four from the published set) with their protonation state at their physical conditions of salinity = 0.15, internal dielectric = 10, pH = 7, and external dielectric = 80 (see Supplementary File S4) 55 . The docking analysis showed the significant binding affinities ranged between (− 11 to − 7.7) kcal/mol with the protonated receptors ( Fig. 8c). We observed that both original and protonated (*) receptors produce almost similar binding scores, which indicate the proposed drugs might be effective against the protonated receptors also. Table 5, Supplementary Files S3 and S4 show the summary results of interacting properties of our target proteins with top-ranked drugs (lead compounds) that produced the highest binding affinity scores. The 3D structure of their interacting complex is shown in the 4th column of Table 5. The 2D schematic diagram of these 3 target proteins with mentioned candidate drugs interaction is given in the 5th column highlighting their www.nature.com/scientificreports/ neighbor residues (within 4 Å of the drug). Key interactions amino acids and their binding with potential targets were shown in the last column.

Molecular dynamic (MD) simulations. Among the proposed candidate drugs, Torin-2, Rapamycin and
Radotinib were the top ranked three candidate drugs (Table 5). Therefore, these top three drug agents were selected for their stability analysis through 100 ns MD-based MM-PBSA simulations. Fig. 9, we observed that all the six systems are significantly stable between the variations of moving and initial drug-target complexes. We calculated their RMSD (root mean square deviation). Figures 9a,b, represents the RMSD corresponding to the proposed receptors (TLR2, USP53) and independent receptors (IRF7, STAT1), respectively. All the systems projected the RMSD around 2 to 4 Å except USP53_Radotinib complex which showed the RMSD around 3.5 to 6.5 Å. The average RMSD for TLR2_Torin-2, TLR2_Rapamycin and USP53_Radotinib complexes were 3.3 Å, 3.7 Å and 4.7 Å, respectively. TLR2_Torin-2 complex showed slight fluctuation in Cα backbone around 40,000 ps and was stabilized in the remaining simulation. Similarly, a streak of continuous fluctuation was found in the TLR2_Rapamycin complex, ranging from 30,000 to 56,000 ps, followed by inconsiderable change. For USP53_radotinib complex, a negligible fluctuation was observed in the starting 12,000 ps and around 62,000 ps to 80,000 ps and remained stable thereafter throughout the simulation. On the other hand, the average RMSD was found to be 3.3 Å for IRF7_Torin-2 complex with slight fluctuation in Cα backbone around 12,000 to 16,000 ps and was stabilized in the remaining simulation. For STAT1_Rapamycin complex, the average RMSD was found to be 2.9 Å. The average RMSD for the STAT1_radotinib bound complex was 2.4 Å, with an overall RMSD of approximately 2.6 Å, indicating that it was comparably more stable among the six selected systems. However, the data indicates that all the systems showed stable internal motion.

Discussion
The current study analyzes the high throughput RNA-Seq data to identify key genomic biomarkers (hub DEGs/ proteins) highlighting their GO terms and KEGG pathways, key regulatory components (TFs and miRNAs), associated comorbidities and repurposable drugs for the treatment against SARS-CoV-2 infections by using the integrated bioinformatics approaches that were summarized in Fig. 1. Totally 109 DEGs were identified between SARS-CoV-2 infected and control samples; among them 16 upregulated and 91 down regulated genes (Table 2) were finally reported. Among them 107 DEGs encoded proteins were used to construct the PPI network (Fig. 3)  TP53, HRAS, CTNNB1, FYN, ABL1, STAT3, STAT1,  JAK2, C1QBP, XBP1, BST2, CD99, IFI35, MAPK11,  RELA, LCK, KIT, EGR1, IL20, ILF3, CASP3, IL19 www.nature.com/scientificreports/  www.nature.com/scientificreports/ which revealed ten hub-DEGs/proteins (SNRPD2, ZFP36, NEDD9, KLF6, USP53, IGF2, TLR2, PAG1, GUCY1A2 and CXCL2) which are considered as the key genomic biomarkers for SARS-CoV-2 infections. The GO functional enrichment analysis of the proposed hub-DEGs/proteins and all the DEGs reflected the significant molecular functions that are highly linked with the COVID-19 infection and proliferation in host cell (Table 3 and Supplementary Files S1 and S2). Among the enriched MFs, the lipopolysaccharide (LPS) immune receptor activity driven by TLR2 hub-DEG is associated with the LPS-induced production of pro-inflammatory cytokines reduction, inflammation by affecting the lungs LPS due to the COVID-19 infection [56][57][58] . The most important and significant functions namely, cytokine regulation, produces the unnecessary "cytokine storm" that promote the adverse events like alveolar damage and fibrosis 59,60 on COVID-19 patients. The interleukin (IL) regulatory pathways are crucial for the important pathophysiological mechanisms called systemic inflammation and cytokine release syndrome [61][62][63] which were significantly associated with the hub genes (Supplementary File S1). The C-C chemokine binding functions driven by ZFP36 hub-DEG are directly involved with the T-cell induced pathogen burden controlling which is also an important receptor group protein for COVID-19 [63][64][65] . The NAD + nucleotidase MF (steered by TLR2 hub-DEG) was found to have protective roles, and mitigate the disease severity if administered prophylactically, and its anti-hyper inflammation properties 66,67 and the Dcp1-Dcp2 complex (steered by ZFP36 hub-DEG) play a positive role in viral infection 68 . The above enrichment analysis noticeably focuses on the association of the identified hub proteins with the diverse significant functions that are crucial for COVID-19. Moreover, the functional enrichment and pathway analysis of all the identified DEGs were recorded, and it was found that the functional pathways enriched by the key DEGs were also enriched by all the DEGs significantly (Supplementary File S1 and S2). The common functional pathways enrichment showed the biological uniformity characteristics among the all identified DEGs and the hub-DEGs.
The KEGG pathway analysis of the proposed hub-DEGs/proteins showed some enriched significant pathways. The top 10 significant KEGG pathways included the legionellosis related pathway, IL-17 signaling pathway, Rheumatoid arthritis pathway, PI3K-Akt signaling pathway, Kaposi's sarcoma-associated herpesvirus infection and the proteoglycans in cancer pathways (Fig. 4). One of the most important pathways driven by TLR2 and CXCL2 hub-DEG namely, the legionellosis which is a typical pneumonia and exposes the cough, shortness of breath, high fever, muscle pains, and headaches 69 . These symptoms are also highly related and most common symptoms for COVID-19 positive patient 70 . During this pandemic situation, the emergency COVID-19 positive patients were permitted to treat with the Hydroxychloroquine (HCQ) although its molecular mechanisms were not completely known and later on WHO advised to avoid this for the treatment. The HCQ -is also commonly used in rheumatic disease treatment and it has been shown that the patients with rheumatoid arthritis (RA) represent lower risk of COVID-19 infection 64 . In our analysis, the hub-DEGs TLR2 and CXCL2 significantly enriched the rheumatoid arthritis pathway which indicates that these genes may have the antagonized property against the COVID-19 infection. The other significant pathways are Kaposi's sarcoma-associated herpesvirus infection (steered by ZFP36 and CXCL2), PI3K-Akt signaling pathway and the proteoglycans in cancer pathways (steered by IGF2 and TLR2). The Kaposi's sarcoma-associated herpesvirus infection is associated with the lung infection 71 , and the PI3K-Akt signaling pathway mainly works with the cell cycle and also with the various proteins function 72,73 . On the other hand, the proteoglycans in cancer pathways is treated as an important cancer related pathway in human 74 . Therefore, based on the molecular pathway enrichment analysis, it can be presumed that the proposed hub proteins may have significant roles in SARS-CoV-2 infection and proliferation and may be treated as prominent therapeutic target.
The TFs versus Hub-DEGs interaction network analysis revealed 6 key TFs-proteins (FOXC1, GATA2, SRF, FOXL1, YY1 and NFIC) as the transcriptional regulatory factors of hub-DEGs (Fig. 5). The basal-like breast cancer (BLBC), Alzheimer's disease, tissue invasion are highly associated with the FOXC1 protein 75,76 . The GATA2 protein is associated with breast and kidney cancer related pathway, when the higher expression pattern of YY1 protein increases the tumour size, higher TNM stage [77][78][79] , the FOXL1 TF are related with proliferation, cell-cycle 80 . The SRF protein is associated with the regulation of cell survival and cell cycle progression in cardiac fibroblasts 81 . The NFIC protein has greater involvement with the tumor genesis of breast cancer, gastric cancer, and glioma [82][83][84] . Also the identified TFs proteins has a significant involvement in various biological functions and pathways [19][20][21][22]85 . The miR-107, miR-16-5p, miR-103a-3p, miR-27a-3p, miR-155-5p and miR-1-3p were identified as the key post transcriptional regulatory factors of hub-DEGs (Fig. 6). The miR-107 (microRNA) has direct interaction with the Coxsackie B3 virus (CVB3) replication and release 86 . The miR-16-5p represented higher expression pattern in lung cancer cell 87 and the miR-103a-3p and miR-27a-3p has a positive correlation with the renal inflammatory dysfunction, cell proliferation and apoptosis 88,89 while the miR-155-5p is associated with breast cancer 90 and the miR-1-3p has the probable interaction with the SARS-CoV2 91 . The above discussion gives the evidence that the proposed regulatory TFs and the miRNAs have an enormous correlation with various biological functions and processes that are closely connected with the symptoms of SARS-CoV-2 infections and proliferation process.
The diseases versus hub-DEGs interaction network analysis showed that the predicted hub-DEGs are associated with various types of cancers and other complex diseases including the respiratory cases (Fig. 7, Supplementary Table S2). The IGF2 was connected with maximum number of diseases in the network followed by the other hub-DEGs. We observed that IGF2 gene is associated with 120 diseases including Cardiovascular Diseases, Colorectal Neoplasms, Cardiomyopathies, Liver carcinoma, Anemia; the CXCL2 was associated with 19 diseases including Rheumatoid Arthritis, Heart failure, Hypertensive disease, Inflammation, Pulmonary Fibrosis, Acute Lung Injury while the ZFP36, KLF6, GUCY1A2 and PAG1 was linked with 18 diseases including Liver Cirrhosis, Experimental Prostatic Neoplasms, Stomach Carcinoma, Inflammation, Arthritis, especially which could be a severe comorbidities of the COVID-19 patients. Among the associated diseases six diseases were connected with two hub-DEGs, notably, inflammation, mammary neoplasms, myocardial ischemia, colorectal neoplasms, somatic mutation, schizophrenia. The inflammation is considered as vital COVID-19 related comorbidity while www.nature.com/scientificreports/ others are also crucial. The hub-DEGs those are related with the above comorbidities also play a significant role for these diseases during the COVID-19 affection. The association of hub-DEGs with several diseases is also supported by the literature review. For example, the hub-protein SNRPD2 is significantly associated with histologic grade in Hepatocellular carcinoma (HCC), mild cognitive impairment (MCI) and Alzheimer's disease (AD) 92,93 . The hub-protein ZFP36 is associated with breast cancer and tumor-suppressive actions during hepatic tumor progression 94,95 . The hub-protein NEDD9 is significantly associated with head and neck and lung cancers 96,97 . The hub-protein KLF6 has a direct involvement in ovarian cancer cell proliferation and metastasis promotion and also works as a critical regulator of pathogenic myeloid cell activation in human 98,99 . The USP53 genes has a greater involvement in cholestatic liver disease 100,101 and the IGF2 proteins are widely known as the diabetes associated protein and can control the insulin secretion in β-cells during fasting 102 . However, many genes/proteins related to lung disease including cancer is highly interacting with SARS-CoV-2 infections, since the patients suffer from the major complexities when the virus infects the lung. The idiopathic pulmonary fibrosis (IPF) is treated as one of the most crucial and serious risk factors of COVID-19 103 , since the COVID-19 positive patients have a greater chance for being enhanced with the IPF which creates numerous complications and leads a high risk to recover from COVID-19 104,105 . The TLR2 is highly responsive in immune-enhancing activity 106 and the Type 2 lung inflammation is associated with the PAG1 gene expression 107 .
The multivariate survival analysis for lung cancer patients is based on low and high expressions of hub-DEGs significantly differentiated the survival curves (Fig. 7b), which indicates that the lung cancer is significantly influenced by the hub-DEGs of SARS-CoV-2 infection. Also, the patients with the lung cancers belong to the high risk of mortality from COVID-19 infection. The above discussion indicates that the proposed genomic biomarkers responsible for SARS-CoV-2 infection are also associated with various comorbidities including diabetes, lung diseases, and respiratory disease and immune systems. Therefore, covid-19 patients usually suffer from multiple complexities and reach to the severe condition that has complex comorbidities [108][109][110] .
To explore effective drugs for the treatments against SARS-CoV-2 infections with comorbidities, we considered the proposed human genomic biomarkers guided 10 hub-proteins (SNRPD2, ZFP36, NEDD9, KLF6, USP53, IGF2, TLR2, PAG1, GUCY1A2, and CXCL2) and 5 key TFs proteins (FOXC1, GATA2, SRF, FOXL1, and YY1) as the drug target receptor proteins and performed their docking analysis with the SARS-CoV-2 3CL protease-guided top ranked 90 FDA approved repurposable drugs. Then we selected top ranked 7 drugs (Torin-2, Rapamycin, Radotinib, Ivermectin, Thiostrepton, Tacrolimus, and Daclatasvir) as the candidate drugs for the treatment against SARS-CoV-2 infection, where the first two drugs showed strong binding affinities with all the target proteins (Fig. 8b). Among the identified candidate drugs, the Ivermectin and the Rapamycin were used to treat the COVID-19 affected patients although it has a lack of wide range of information about their activity against the SARS-CoV-2 virus 111 . Since Ivermectin has a potential antiviral activity, it has been used in the treatment of various virus infection including the SARS-CoV-2 treatment by dosing solely or with a combining other drugs 112,113 . Moreover, many studies suggested to use the Ivermectin as a potential therapeutic for COVID-19 114,115 . On the other hand, Rapamycin is widely used as inhibitor of protein synthesis and constrains the expression of pro-inflammatory cytokines such as IL-2, IL-6 and, IL-10 116 , therefore it has been also given for COVID-19 treatment 116,117 . Rapamycin can also interact with the spike protein of the SARS-CoV-2 and work in mTOR pathway inhibitors [118][119][120][121] . This result is indicating the consistency of therapeutic potentiality of the proposed hub proteins and TFs for the COVID-19 treatment. The remaining drugs were found as new potential drug candidates based on their binding affinity with the hub proteins and TFs. The Torin-2 is considered as a kinase inhibitor which worked in the PI3K-Akt/mTOR signaling pathway 117,119,122 , which supports our previous pathway analysis for the hub proteins. Radotinib (IY-5511) were being prescribed for the chronic myeloid leukaemia (CML) 123,124 whereas Thiostrepton was used for acute kidney injury treatment 125 . The evidences show that the Tacrolimus has a positive inhibitory impact on the COVID-19 patients with comorbidities like kidney and liver transplantation 126,127 . Moreover, we validated these seven candidate drugs against the state-of-the-art alternatives already published top-ranked 11 independent and 8 protonated receptors and found their strong binding affinities (Fig. 8c), which indicates that the proposed drugs are effective against the state-of-the-arts alternatives SARS-CoV-2 infection causing independent receptor proteins also. Finally, we examined the stability of top-ranked three drugs (Torin-2, Rapamycin and Radotinib) by using 100 ns MD-based MM-PBSA simulations for two top ranked proposed (TLR2, USP53) and independent (IRF7, STAT1) receptors, and observed their stable performance according to the laws of physics 128,129 . Therefore, the proposed candidate drugs might play the vital role for the treatment against different variants of SARS-CoV-2 infections with comorbidities since our proposed target proteins are also associated with several comorbidities. The present study emphasises the further wet lab experimental validation for both the proposed target proteins and candidate drugs.

Conclusion
The present study aims to explore genomic biomarkers (drug targets) for SARS-CoV-2 infections highlighting their functions, pathways, regulatory factors, associated comorbidities and candidate drugs. To achieve the goal, at first 109 DEGs between SARS-CoV-2 and control sample were detected from RNA-Seq profiles. The top ranked 10 hub-DEGs/proteins (TLR2, USP53, GUCY1A2, SNRPD2, NEDD9, IGF2, CXCL2, KLF6, PAG1 and ZFP36) were identified as genomic biomarkers by the PPI network analysis of DEGs with the NetworkAnalyst tool and STRING database. Gene-set enrichment analysis (GSEA) through the GO functional and KEGG pathway was then conducted to predict the associated functions and pathways of these genomic biomarkers. The gene regulatory network (GRN) analysis identified top ranked 5 TFs proteins (FOXC1, GATA2, SRF, FOXL1 and YY1) and 6 miRNAs (miR-107, miR-16-5p, miR-103a-3p, miR-27a-3p, miR-155-5p and miR-1-3p) as the transcriptional and post-transcriptional factors, respectively. The diseases versus genomic biomarkers interaction network analysis, survival analysis of lung cancer patients with genomic biomarkers and literature review showed that our www.nature.com/scientificreports/ proposed genomic biomarkers are associated with various types of comorbidities including diabetes, lung and heart diseases, respiratory disease and immune systems. Then we considered 10 hub-proteins (proposed genomic biomarkers) and 5 key TFs-proteins as the drug target receptor proteins to explore effective drugs by molecular docking analysis with the SARS-CoV-2 3CL protease-guided top 90 FDA approved anti-viral drugs. Then we selected top ranked 7 candidate drugs (Torin-2, Rapamycin, Radotinib, Ivermectin, Thiostrepton, Tacrolimus and Daclatasvir) with respect to our proposed 15 target proteins for the treatment against SARS-CoV-2 infection. Then we investigated the resistance of our proposed candidate drugs against the state-of-the-art alternatives of recently published top ranked 11 independent receptors for SARS-CoV-2 infections and observed that our proposed drugs are also effective against those independent receptors. Finally, we investigated the stability performance of top three drugs (Torin-2, Rapamycin and Radotinib) by using 100 ns MD-based MM-PBSA simulations for two top ranked proposed receptors (TLR2, USP53) and top two independent receptors (IRF7, STAT1), and observed their stable performance. In the context of already published host transcriptome-guided candidate drugs for covid-19, so far, no researchers yet investigated the resistance of their suggested drugs against the state-of-the-art alternatives independent receptors proposed by others computationally. In this study, we considered this issue. Thus, we may sate that this study is partially unique. As covid-19 is a new coronavirus disease, there has been little research on exploring globally effective drugs. In this regard, this research on coronavirus disease might open up new possibilities to explore globally more effective drugs computationally.

Materials and methods
Data sources and descriptions. We used both original data and metadata associated with SARS-CoV-2 infections to reach the goal of this study as described below.

Collection of RNA-Seq profiles as original data (case/control). We collected the original host
RNA-Seq count dataset to explore genomic biomarkers and drug target key receptor proteins associated with SARS-CoV-2 infection. This dataset was downloaded from the NCBI Gene Expression Omnibus (GEO) data repository with the accession number GSE150316. This dataset consisted of 88 samples generated from different organs including lung, heart, jejunum, liver, kidney, bowel, fat, skin, bone marrow and placenta, where 5 samples were COVID-19 negative (control). The count of each sample was generated on 59,091 transcripts. This dataset was first analyzed by Desai et al. 130 to investigate the temporal and spatial heterogeneity of host genome response to SARS-CoV-2 pulmonary infection. In our case, we considered only lung tissue samples to avoid the spatial heterogeneity problem from the dataset. Our analyzed original dataset consisted of 35 lung tissue samples infected with SARS-CoV-2 (case) and 5 control samples.

Collection of drug agents and receptor proteins as metadata. We collected SARS-CoV-2 3CL
protease-guided top listed 90 drugs out of 3410 FDA approved antiviral drugs published by Beck et al. 50 as the meta drug agents to explore few top ranked host transcriptome-guided drugs against SARS-CoV-2 infections by molecular docking with our proposed receptor proteins. The 3D structures of 90 FDA-approved drugs (Supplementary Table S1) were downloaded from PubChem database 54 . To validate our proposed host transcriptomeguided repurposed drugs by molecular docking with the top listed receptor proteins that were published in different reputed journals, we reviewed 22 different articles infections 28-49 associated with SARS-CoV-2 infections and selected 11 top listed receptor proteins out of 193. Then their docking performance with our proposed drugs were investigated whether the drugs are keeping consistency with high binding affinities as in our proposed drug target hub-DEGs.

Integrated bioinformatics and system biology analyses. The integrated bioinformatics and system
biology approaches were utilized in this study as described below:

Identification of differentially expressed genes (DEGs). Identification of differentially expressed
genes (DEG) is one the most important tasks in this study. There are several methods for identification of DEGs from RNA-Seq profiles. Most of them are sensitive to outlying observations. There are a few robust approaches for identification of DEGs from RNA-Seq profiles. However, non-robust approaches are slightly better than robust approaches in absence of outliers, while the robust approaches are much better than the classical approaches in presence of outliers. Therefore, we considered edgeR 131 as a popular non-robust and DESeq2 132 as a popular robust approaches to take their advantages in our analyses, since few RNA-Seq counts are often contaminated by outliers due to several steps involves in the data generating process. However, normalization is a compulsory step for RNA-Seq data analysis. It removes systematic technical bias from the data and makes the samples comparable. The edgeR approach utilizes TMM (trimmed mean of M values) normalization, whereas the DESeq2 utilizes the median-of-ratios method. The edgeR method was formulated based on generalized linear model (GLM) of the negative binomial family. It assumes negative binomial (NB) distribution for the read counts and uses the empirical Bayes for squeezing the tag-wise dispersions toward common dispersion, whereas DESeq2 also considered GLM of the NB family and assumes NB distribution for the read counts and uses the empirical Bayes to shrink gene-wise dispersion estimates towards fitted values. The edgeR approach utilizes the likelihood ratio test (LRT) statistic to calculate the p-values and test the null hypothesis of no differential read counts between case and control groups, whereas the DESeq2 utilizes the Wald test statistic to calculate the p-values and test the null hypothesis of no differential read counts between case and control groups. The p-values of both the edgeR and the DESeq2 approaches are then adjusted for multiple testing using the procedure of Benjamini and Hochberg 133 . www.nature.com/scientificreports/ In this paper, we considered the gth gene (g = 1,2, …, 59,091) as a differentially expressed gene (DEG) between case and control groups if its adjusted p g -value < 0.05 along with |log 2 (aFC g )|> 1 by controlling the false discovery rate (FDR) at 5%, otherwise, it was considered as equally expressed gene (EEG). The gth gene was considered as upregulated or downregulated DEG if its adjusted p g -value < 0.05 along with log 2 (aFC g ) > 1 or log 2 (aFC g ) < − 1, respectively. Here aFC is the abbreviation of average fold change which is defined as aFC g = x g /y g (the fold change of x g with respect to y g ), where x g and y g are the averages of normalized counts of case and control groups with respect to gth gene, respectively. For example, a change from y g = 3 to x g = 9 produces aFA g = 3 which is referred to as a "threefold upregulated in average". Similarly, a change from y g = 9 to x g = 3 produces aFA g = 1/3 which is referred to as a "threefold downregulated in average". Let A UR and A DR were the upregulated and downregulated DEGs-sets respectively, detected by edgeR. Again, let B UR and B DR were the upregulated and down-regulated DEGs-sets respectively, detected by DESeq2. Then we defined upregulated gene-set as (A UR ∪ B UR ) and down-regulated as (A DR ∪ B DR ) by combining the results DEGs results of edgR and DESeq2. Let C was the set of contradictory upregulated and downregulated DEGs estimated by both edgeR and DESeq2. For example, if a DEG g ∈ A UR but g / ∈ B UR or g ∈ A DR but g / ∈ B DR , then the DEG 'g' was considered as a contradictory DEG. We removed this type of contradictory DEGs from further analysis. Then the final DEGs-set for further analysis was defined as where, upregulated DEGs-set was defined as and the downregulated DEGs-set was defined as Functional and pathway enrichment analysis of Hub-DEGs. Gene ontology (GO) functional and Kyoto encyclopedia of genes and genomes (KEGG) pathway enrichment/annotation/over-representation analysis 134 is a widely used approach to determine the significantly annotated/enriched/over-represented functions/classes/terms (biological processes (BP), molecular functions (MF) and cellular components (CC)) and pathways by the identified DEGs/Hub-DEGs. The BP is a change or complex of changes during the granularity period of the cell that is mediated by one or more gene products for different biological objectives. The MFs are the biochemical activities of gene products. The CC is a place in a cell in which a gene product is active. KEGG pathway is a collection of experimentally validated pathway maps [135][136][137] representing our knowledge of the molecular interaction, reaction and relation networks for metabolism, cellular processes, genetic information processing, organismal systems, environmental information processing, human diseases and drug development. Let S i is the annotated gene-set corresponding to ith type of biological functions or pathways given in the database and M i is the number of genes in S i (i = 1, 2,…,r); N is the total number of annotated genes those construct the entire combine set where S c i is the complement set of S i . Again, let n is the total number of DEGs of interest and k i is the number of DEGs belonging to the annotated gene-set S i . This problem is summarized by the following contingency table (Table 1): The probability of observing exactly k i DEGs in S i (ith functional/pathway annotated gene-set) out of n DEGs can be modeled by the hypergeometric distribution. Hence, the probability of observing k i or more genes in S i out of n DEGs can be calculated by the cumulative probability as follows The subset of DEGs belonging to S i is said to be significantly enriched if its adjusted p-value (p i ) is less than 0.05 by controlling the FDR at 5%. The g:GOSt core, a free web tool for functional analyses (https:// biit. cs. ut. ee/ gprofi ler/ gost), embedded with the g:Profiler web server 138 utilizes the cumulative hypergeometric approach to calculate these p-values (p i' s) and their adjusted p-values for multiple testing using the procedure of Benjamini and Hochberg 133 . The g:Profiler is a regularly updated database for performing GO functional and KEGG pathway enrichment analysis. Therefore, in this paper, we performed functional and pathway enrichment analysis by using the g:GOSt tool entrenched in the g:Profiler web server to disclose the statistically significant GO terms of biological processes, molecular functions and cellular components and KEGG pathways for DEGs associated with the SARS-CoV-2 infections.   are the physical magnetism of two or more protein molecules that occur due to biochemical reactions steered by hydrogen bonding, electrostatic forces and the hydrophobic effect. Generally, a protein cannot work without interaction with one or more other proteins. The PPI network contributes to the formation of larger protein complexes for performing a specific task 139 . It carries out many molecular functions and biological processes including protein function, cell-to-cell interactions, metabolic and developmental control, disease incidence, and therapy design. A PPI network is represented as an undirected graph, where nodes and edges indicate proteins and their interactions, respectively. A node having the largest number of significant interactions/connections/edges with other nodes is considered as the top ranked hub-protein. Therefore, the PPI network analyses of DEGs are now widely using to explore hub-DEGs/proteins. In this study, the PPI network of DEGs was constructed by using the NetworkAnalyst 140 with the STRING database 141 plagued in Cytoscape. A topological exploration based on dual-metric measurements degree (> 10) and betweenness were utilized to determine the highly representative DEGs/proteins those are also known as hub-DEGs/proteins.

Regulatory network analysis of hub-DEGs.
A gene regulatory network (GRN) shows molecular regulators that interact with each other to control the gene expression levels of mRNA and proteins. Transcription factors (TFs) and microRNAs (miRNAs) are considered as the most important molecular regulators of genes. A transcription factor (TF) is a protein that binds to a specific DNA region (promoter/enhancer) and regulates gene expression by promoting or suppressing transcription. TFs are considered as the main players in GRN. A miRNA is a small single-stranded non-coding RNA molecule (containing about 22 nucleotides) that works in RNA silencing and post-transcriptional regulation of gene expression. There are up to 1600 TFs and 1900 miR-NAs in the human genome. A TFs and hub-DEGs/proteins interaction network is considered as an undirected graph, where nodes indicate TFs or hub-DEGs and edges represents interactions between TFs and hub-DEGs, respectively. A TF-node having the largest number of significant interactions/connections/edges with hub-DEGs nodes is considered as the top ranked hub-TF regulator of hub-DEGs. To explore hub-TFs of hub-DEGs, we contracted the interaction network between TFs and hub-DEGs by using the NetworkAnalyst tool 140 based on JASPAR database 142 . Similarly, miRNA and hub-DEGs interaction network was constructed through the Net-workAnalyst based on TarBase V8.0 143 database to identify the key regulatory miRNAs for hub-DEGs. These key regulatory biomolecules were selected based on the highest topological matrices (degree of connectivity and betweenness) applied on the interaction network.

Association of hub-DEGs with comorbidities.
To investigate the association of hub-DEGs with other diseases, we performed diseases versus hub-DEGs interaction network analysis by using the NetworkAnalyst tool 140 based on DisGeNET 144 database. We also performed survival analysis based on the expression of hub-DEGs with lung cancer patients by using Kaplan-Meier (KM) plotter 145 to investigate the association of hub-DEGs with lung cancer, since SARS-CoV-2 samples were collected from lung tissue and also it affects the lung mostly. The KM plotter utilizes the log rank statistic to test the significance of association.
Drug repurposing by molecular docking study. To propose in-silico validated efficient FDA approved repurposed drugs for the treatment of SARS-CoV-2 infections, we employed molecular docking between the target receptor proteins and drug agents. We considered our proposed hub-DEGs based hub-proteins and associated TFs proteins as the drug target receptor proteins and SARS-CoV-2 transcriptome-guided top listed 90 drugs out of 3410 FDA approved antiviral drugs published by Beck et.al. 2020 50 as drug agents or ligands for docking analysis. The molecular docking study requires 3-Dimensional (3D) structures of both receptor proteins and candidate drugs. We downloaded the 3D structure of all targeted receptor proteins from Protein Data Bank (PDB) 51 and SWISS-MODEL 52 , a homology modeling based database. The 3D structures of our selected 90 drug agents (Supplementary Table S1) were downloaded from PubChem database 54 . The 3D structure of the target proteins was visualized using Discovery Studio Visualizer 2019 146 and the water molecules, co-crystal ligands which were bound to the protein were removed. Further, the protein was prepared using USCF Chimera and Autodock vina 147 in PyRx open source software by adding charges and minimizing the energy of the protein and subsequently converting it to pdbqt format [147][148][149] . The exhaustiveness parameter was set to 8. The Protein-Ligand Interaction Profiler (PLIP) web service 150 and PyMol 151 were used to analyze the docked complexes for surface complexes, types and distances of non-covalent bonds. Let A ij denotes the binding affinity between ith target protein (i = 1, 2, … , m) and jth drug agent (j = 1, 2, … , n). Then target proteins are ordered according to the descending order of row sums n j=1 A ij , j = 1,2, … , m, and drug agents are ordered according to the descending order of column sums m i=1 A ij , j = 1,2, … , n, to select the top ranking few drug agents as the candidate drugs. The average binding affinity score less than or equal to − 8.0 was considered as the better drug selection criterion against the receptor proteins. Then we validated the proposed repurposed drugs by molecular docking with the top listed receptor proteins associated with SARS-CoV-2 infections that were obtained by the literature review. To select the top listed receptor proteins associated with SARS-CoV-2 infections, we reviewed 22 recently published articles  and selected the top listed 11 receptor proteins. Molecular dynamic (MD) simulations. MD simulations were carried out by using YASARA Dynamics software 152 , and the AMBER14 force field 153 to study the dynamic behavior of the top-ranked protein-ligand complexes. A total of six different systems were used to run MD simulation. The systems included top three hits, TLR2_Torin-2, TLR2_Rapamycin, USP53_Radotinib complexes corresponding to our proposed receptors and another three hits, IRF7_Torin-2, STAT1_Rapamycin, STAT1_Radotinib complexes from top listed independent receptors. Before simulation, the target-drug complex's hydrogen bonding network was optimized and www.nature.com/scientificreports/ solvated by a TIP3P 154 water model in a simulation cell. Periodic boundary conditions were maintained with a solvent density of 0.997 g L −1 . Titratable amino acids in the protein complex were subjected to pKa calculation during solvation. The initial energy minimization process of each simulation system, consisting of 41,645 ± 10, 41,645 ± 10, 95,924, 105,924 ± 10, 105,924 ± 10 and 85,924 atoms for TLR2_Torin-2, TLR2_Rapamycin, USP53_ Radotinib, STAT1_Rapamycin, STAT1_Radotinib and IRF7_Torin-2 complexes was performed by a simulated annealing method respectively, using the steepest gradient approach (5000 cycles). Each simulation was run with a multiple time step algorithm 155 , using a time-step interval of 2.50 fs under physiological conditions (298 K, pH 7.4, 0.9% NaCl) 156 . All bond lengths were constrained using the linear constraint solver (LINCS) 157 algorithm, and SETTLE 158 was used for water molecules. Long-range electrostatic interactions were described by the PME 159 methods, and, finally, 100 ns MD simulation was accomplished at Berendsen thermostat 160 and constant pressure. The trajectories were recorded every 250 ps for further analysis, and subsequent analysis was implemented by default script of YASARA 161 macro and SciDAVis free software available at http:// scida vis. sourc eforge. net/. All snapshots were then subjected to YASARA software's MM-PBSA (MM-Poisson-Boltzmann surface area) binding free energy calculation using the formula below 162 , Here, MM-PBSA binding energy was calculated using YASARA built-in macros using AMBER 14 as a force field, with larger positive energies indicating better binding 163 . The PBSA is one of the most appealing solvation systems used in computer-aided drug design techniques to determine binding energy of protein-drug complexes 164,165 .